Fix CI and validation scripts #154

Merged
merged 1 commit into from
Apr 17, 2024

guangy10
Contributor

@guangy10 guangy10 commented Apr 12, 2024

Unified CI jobs and scripts for model/platform validation. Will clean up other duplicate jobs in separate PRs.

@facebook-github-bot added the CLA Signed label Apr 12, 2024
@guangy10 marked this pull request as draft April 12, 2024 05:06
@guangy10 force-pushed the fix_validation_scripts branch 20 times, most recently from f70289e to d8b9e5a April 12, 2024 07:52
@mikekgfb self-requested a review April 12, 2024 16:42
@guangy10 force-pushed the fix_validation_scripts branch 2 times, most recently from e879af8 to c0f228b April 15, 2024 23:06
@guangy10 requested a review from huydhn April 15, 2024 23:30
@guangy10 force-pushed the fix_validation_scripts branch 11 times, most recently from fe73053 to 9d5a347 April 17, 2024 01:52
@guangy10 force-pushed the fix_validation_scripts branch from 9d5a347 to fbab1e9 April 17, 2024 02:03
@guangy10
Contributor Author

guangy10 commented Apr 17, 2024

This PR has failures (CUDA + compile), but they are unrelated to it: the same failures show up on other CUDA jobs and on other PRs.

The periodic job also has failures that are not caused by this PR:

  • All of the CUDA failures are caused by CUDA out-of-memory (OOM) errors. cc: @mikekgfb @malfet
  • The AOTI error on aarch64 is a known issue, tracked in T185486782.
  • [New] The compile error on aarch64 appears to be a new issue. According to the validation tracker (https://fburl.com/gsheet/y52pwfl8), compile worked fine last Friday, so this looks like a regression. cc: @mikekgfb

I think we should run one large model on PRs. The data shows it won't take too long: successful jobs such as https://github.com/pytorch/torchchat/actions/runs/8715200573/job/23906715589 take about 4 to 6 minutes, which is acceptable for a PR job.

@guangy10 merged commit ceeef3e into main Apr 17, 2024
21 of 37 checks passed
@guangy10 deleted the fix_validation_scripts branch April 17, 2024 02:56
metascroy pushed a commit that referenced this pull request Apr 17, 2024
metascroy added a commit that referenced this pull request Apr 17, 2024
* clean up gguf loading.  Move model loading to meta.

* remove cpu

* Fix CI and validation scripts (#154)

* missing device (#232)

* Use generator args to group all arguments to generator (#231)

* prompt

* chat_mode, num_samples

* Move more generator args to use dataclass (#233)

* prompt

* chat_mode, num_samples

* move more args

* more gen args

* update

* args

* undo some changes

* typos

* Minor lint fixes (#236)

* remove redundancy & remove int4 linear test from ET tests (#237)

* remove redundancy

* no int4 linear on ET

* small changes

---------

Co-authored-by: Guang Yang <[email protected]>
Co-authored-by: Michael Gschwind <[email protected]>
Co-authored-by: Mergen Nachin <[email protected]>
malfet pushed a commit that referenced this pull request Jul 17, 2024
Labels
ciflow/periodic, CLA Signed
Projects
None yet
5 participants